Learning Mixtures of Localized Rules by Maximizing the Area Under the ROC Curve

Authors

  • Tobias Sing
  • Niko Beerenwinkel
  • Thomas Lengauer

Abstract

We introduce a model class for statistical learning which is based on mixtures of propositional rules. In our mixture model, the weight of a rule is not uniform over the entire instance space. Rather, it depends on the instance at hand. This is motivated by applications in molecular biology, where it is frequently observed that the effect of a particular mutational pattern depends on the genetic background in which it occurs. We assume in our model that the effect of a given pattern of mutations will be very similar only among sequences that are also highly similar to each other. On the other hand, a pattern might have very different effects in different genetic backgrounds. Model inference consists of repeated iteration through a sequence of three steps: First, a new rule is mined from a resampled data set using the Apriori algorithm. Next, the localization information for the rule is computed. Finally, the weights of all rules in the mixture model are re-optimized simultaneously. This weight optimization is done using the area under the ROC curve (AUC) rather than the error rate as the objective function. Correspondingly, the weight of a sample in the resampling procedure is based on the rank of the sample relative to the other samples rather than directly on the score itself (as in boosting). This strategy can be seen as an adaptation of boosting to the case of AUC optimization. Finally, we apply our method to the problem of predicting HIV-1 coreceptor usage from the amino acid sequence of the viral surface protein.
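
The abstract's contrast between rank-based and score-based sample weights can be made concrete with a small sketch. The first function computes the AUC in its standard pairwise form: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as one half. The second function, `rank_based_weights`, is a hypothetical illustration only: the abstract does not give the authors' weighting formula, so the scheme here (count how many opposite-class samples a sample is misranked against, add a `smoothing` constant, normalize) is an assumption, as are both function names.

```python
def auc(scores, labels):
    """AUC as the fraction of correctly ordered (positive, negative) pairs.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return correct / (len(pos) * len(neg))


def rank_based_weights(scores, labels, smoothing=1.0):
    """Hypothetical resampling weights that depend only on ranks.

    ASSUMPTION: not the authors' formula. Each sample's raw weight is the
    number of opposite-class samples it is misranked against (ties count
    one half), plus `smoothing` so that every sample can still be drawn.
    """
    weights = []
    for s, y in zip(scores, labels):
        if y == 1:
            # Negatives scored at or above this positive are misranked pairs.
            misranked = sum(
                1.0 if t > s else 0.5 if t == s else 0.0
                for t, z in zip(scores, labels)
                if z == 0
            )
        else:
            # Positives scored at or below this negative are misranked pairs.
            misranked = sum(
                1.0 if t < s else 0.5 if t == s else 0.0
                for t, z in zip(scores, labels)
                if z == 1
            )
        weights.append(misranked + smoothing)
    total = sum(weights)
    return [w / total for w in weights]


scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(auc(scores, labels))  # 0.75: three of the four pairs are ordered correctly
print(rank_based_weights(scores, labels))
```

Because both functions compare scores only by order, any strictly increasing transformation of the scores leaves the AUC and the weights unchanged, which is the property that separates this kind of weighting from the score-based weights of standard boosting.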

Similar resources

Maximizing the Area under the ROC Curve using Incremental Reduced Error Pruning

The use of incremental reduced error pruning for maximizing the area under the ROC curve (AUC) instead of accuracy is investigated. A commonly used accuracy-based exclusion criterion is shown to include rules that result in concave ROC curves as well as to exclude rules that result in convex ROC curves. A previously proposed exclusion criterion for unordered rule sets, based on the lift, is on ...

Maximizing the Area under the ROC Curve with Decision Lists and Rule Sets

Decision lists (or ordered rule sets) have two attractive properties compared to unordered rule sets: they require a simpler classification procedure and they allow for a more compact representation. However, it is an open question what effect these properties have on the area under the ROC curve (AUC). Two ways of forming decision lists are considered in this study: by generating a sequence of...

Zoning of flood hazard in Nowshahr city using machine learning models

The aim of this study is to predict and model flood hazard in the city of Nowshahr, Mazandaran province, using machine learning models. The criteria and indicators affecting flood hazard were identified from a review of the literature, then converted into rasters in the ArcGIS environment, and finally standardized by a fuzzy method for use in the models. K-nearest neighbor ...

Risk Estimation by Maximizing the Area under ROC Curve

Risks exist in many different domains; medical diagnoses, financial markets, fraud detection and insurance policies are some examples. Various risk measures and risk estimation systems have hitherto been proposed and this paper suggests a new risk estimation method. Risk estimation by maximizing the area under a receiver operating characteristics (ROC) curve (REMARC) defines risk estimation as ...

ROCCER: A ROC convex hull rule learning algorithm

In this paper we propose a method to construct rule sets that have a convex hull in ROC space. We introduce a rule selection algorithm called ROCCER, which operates by selecting rules from a larger set of rules in order to optimise Area Under the ROC Curve (AUC). Compared with set covering algorithms, our method is less dependent on the previously induced rules. Experimental results on three UC...

Publication date: 2004